MLOps: Deploying Models into Production

Myles Mitchell @ Jumping Rivers

Welcome!

  • Slides: URL

  • GitHub: URL

  • Access the virtual environment:

    • URL: URL

    • Master password: PASSWORD

Check in using the QR code! QR code for morning session check-in

Cloud Environment

URL

Password: PASSWORD

Screenshot of login page

Before we start…

Who am I?

  • Background in Astrophysics.

  • Data Scientist @ Jumping Rivers:

    • Python & R support for various clients.

    • Teach courses in Python, R, SQL, Machine Learning.

  • Hobbies include hiking and travelling.

Jumping Rivers

↗ jumpingrivers.com   𝕏 @jumping_uk

  • Machine learning
  • Dashboard development
  • R packages and APIs
  • Data pipelines
  • Code review
     

Introduction to MLOps

Let’s take a step back…

The typical data science workflow:

Typical data science workflow. Starting with data importing and tidying, followed by a cycle of data transformation, data visualisation and modelling which repeats as the model is better understood. The results from this cycle are then communicated.
  • Data is imported and tidied.
  • Cycle of data transformation, data visualisation and modelling.
  • The cycle repeats as we understand the underlying model better.
  • The results are communicated to an external audience.

From Classical Stats to Machine Learning

  • The classical workflow prioritises understanding the system behind the data.
  • By contrast, Machine Learning prioritises prediction.
  • As the data grows, we reconsider and update our ML models to optimise predictive power.
  • A goal of MLOps is to streamline this cycle.

What is MLOps?

MLOps: Machine Learning Operations

MLOps workflow. Starting with data importing and tidying, followed by modelling and finishing with model versioning, deployment and monitoring. The cycle then repeats as more data is acquired.
  • Framework to continuously build, deploy and maintain ML models.
  • Encapsulates the “full stack” from data acquisition to model deployment.
  • Monitor models in production and detect “model drift”.
  • Versioning of models and data.

MLOps frameworks

  • Amazon SageMaker
  • Microsoft Azure
  • Google Cloud Platform
  • Vetiver by Posit (free to use and integrates with Posit Connect)

     

Vetiver

  • Open-source tool maintained by Posit PBC.
  • Integrates with ML libraries in R and Python.
  • Fluent tooling to version, deploy and monitor a trained model.
  • Supports deploying models to localhost, which is a great way to learn MLOps!
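
The whole vetiver workflow fits in a few lines of R. A minimal sketch (the model, pin name and port here are illustrative, not from the course materials):

```r
library(vetiver)
library(pins)
library(plumber)

# Train any supported model, e.g. a simple linear model
model <- lm(mpg ~ wt + hp, data = mtcars)

# Wrap it as a versioned vetiver model and pin it to a board
v <- vetiver_model(model, "cars_mpg")
board <- board_temp()       # a local, temporary board for experimenting
vetiver_pin_write(board, v)

# Serve the model as a REST API on localhost
pr() |>
  vetiver_api(v) |>
  pr_run(port = 8080)
```

Swapping `board_temp()` for, say, a Posit Connect board is how the same code moves from localhost to production.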

Your first MLOps pipeline

Let’s build an MLOps stack!

  • Data
  • Model
  • Training
  • Deployment
  • Monitoring
  • Repeat

Data acquisition

Data tidying

Data validation

Data versioning

Task 1: Preparing your data

  • Open task1.txt

  • Adjust the validation code with the correct column types

  • Run the code, passing in the lemur.csv data

  • Not an R user? The solution can be found in task1_solutions.R

  • You have just built a data validation pipeline!

(10 minutes)
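
The validation code itself lives in task1.txt; as a sketch of the idea, column types can be declared up front and parsing failures turned into hard errors (the column names below are hypothetical — the real lemur.csv schema will differ):

```r
library(readr)

# Hypothetical schema: substitute the actual columns of lemur.csv
lemur <- read_csv(
  "lemur.csv",
  col_types = cols(
    species   = col_character(),
    sex       = col_character(),
    age_years = col_double(),
    weight_g  = col_double()
  )
)

# Fail loudly if any value could not be parsed as its declared type
stopifnot(nrow(problems(lemur)) == 0)
```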

Model selection

Model training

Model versioning

Task 2: Training your model

  • Open task2.txt

  • Run your solution to task 1 to prepare the data

  • Pass this into the train() function

  • Run assess() with the unseen test data to score the model

  • Save these metrics along with the model in an RDS file

  • Your pipeline now includes model training and scoring!
(10 minutes)
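
The course's train() and assess() helpers are provided in task2.txt; an equivalent sketch using tidymodels, with a hypothetical formula and columns, looks like this:

```r
library(tidymodels)

# Hypothetical formula and columns: substitute the course's own
split  <- initial_split(lemur, prop = 0.8)
fitted <- linear_reg() |>
  fit(weight_g ~ age_years + sex, data = training(split))

# Score on the unseen test data
scored <- augment(fitted, testing(split))
scores <- metrics(scored, truth = weight_g, estimate = .pred)

# Save the model together with its metrics, as in the task
saveRDS(list(model = fitted, metrics = scores), "lemur_model.rds")
```

Saving the metrics alongside the model matters later: they are the baseline we compare against when monitoring for drift.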

Deployment

Task 3: Deploying your model

  • Open task3.txt

  • Upload your RDS file to the cloud

  • Obtain your model endpoint URL

  • Make a prediction using a GET request

  • Your model is in production! But we’re not finished yet…
(10 minutes)
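
Once deployed, vetiver can also query the endpoint directly from R. A sketch (the URL and predictor columns are placeholders — use the endpoint and columns from your own deployment):

```r
library(vetiver)

# Replace with the endpoint URL obtained in the task
endpoint <- vetiver_endpoint("https://example.com/predict")

# Hypothetical predictors: match the columns your model was trained on
new_lemur <- data.frame(age_years = 4, sex = "M")
predict(endpoint, new_lemur)
```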

Monitoring your model

Deployment is just the beginning…

Why should I care?

Some case studies…

Monitoring

Model drift

Retraining

Task 4: Detecting model drift

  • Open task4.txt

  • lemurs_new.csv contains the latest version of the data

  • Run your predictions on this new data and compare the scores with the metrics saved during training

  • Your pipeline can now detect model drift!
(10 minutes)
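
vetiver ships helpers for exactly this monitoring step. A sketch of computing and plotting metrics over time (the date and outcome columns are hypothetical, and predictions are assumed to have already been added as `.pred`):

```r
library(vetiver)
library(yardstick)
library(readr)

new_lemurs <- read_csv("lemurs_new.csv")

# Assume predictions were added as .pred and an observation date exists
drift <- vetiver_compute_metrics(
  new_lemurs,
  date_var   = obs_date,   # hypothetical date column
  period     = "month",
  truth      = weight_g,   # hypothetical outcome column
  estimate   = .pred,
  metric_set = metric_set(rmse, rsq)
)

# Falling rsq or rising rmse over time suggests the model has drifted
vetiver_plot_metrics(drift)
```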

The future of MLOps

Real-time data

Thanks for listening!